186 research outputs found
Facial feature point extraction method based on combination of shape extraction and pattern matching
Comparison between Constrained Mutual Subspace Method and Orthogonal Mutual Subspace Method – From the viewpoint of orthogonalization of subspaces –
This paper compares the performance of the constrained mutual subspace method (CMSM) and the orthogonal mutual subspace method (OMSM), and also of their nonlinear extensions, namely kernel CMSM (KCMSM) and kernel OMSM (KOMSM). Although the principles of feature extraction in these methods are different, their effectiveness is commonly derived from the orthogonalization of subspaces, which is widely used to measure the performance of subspace-based methods. CMSM makes the relation between class subspaces similar to an orthogonal relation by projecting the class subspaces onto the generalized difference subspace. KCMSM is also based on this projection in the nonlinear feature space. On the other hand, OMSM orthogonalizes class subspaces directly by whitening the distribution of the class subspaces. KOMSM also utilizes this orthogonalization method in the nonlinear feature space. The experimental results show that the performance of both kernel methods (KCMSM and KOMSM) is very high compared to their linear counterparts (CMSM and OMSM), and that their performance levels are of the same order in spite of their different principles of orthogonalization.
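The whitening idea behind OMSM can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the random toy data, and the tolerance are assumptions made for illustration. It builds the sum of the class projection matrices, whitens it, and checks that the canonical angles between two class subspaces become right angles when the class subspaces are linearly independent.

```python
import numpy as np

def class_basis(samples, dim):
    """Orthonormal basis (d x dim) of the dominant subspace of the sample columns."""
    u, _, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :dim]

def canonical_cosines(basis_a, basis_b):
    """Cosines of the canonical angles between the spans of two bases."""
    q_a, _ = np.linalg.qr(basis_a)
    q_b, _ = np.linalg.qr(basis_b)
    return np.clip(np.linalg.svd(q_a.T @ q_b, compute_uv=False), 0.0, 1.0)

def whitening_matrix(bases, tol=1e-10):
    """Whitening matrix Lambda^(-1/2) B^T of G = sum_c U_c U_c^T (assumed form)."""
    g = sum(u @ u.T for u in bases)
    eigvals, eigvecs = np.linalg.eigh(g)
    keep = eigvals > tol  # drop the null space of G
    return (eigvecs[:, keep] / np.sqrt(eigvals[keep])).T

# Toy data (hypothetical): three 3-dimensional class subspaces in a 20-dimensional space.
rng = np.random.default_rng(0)
bases = [class_basis(rng.standard_normal((20, 10)), dim=3) for _ in range(3)]

whiten = whitening_matrix(bases)
cos_before = canonical_cosines(bases[0], bases[1]).max()
cos_after = canonical_cosines(whiten @ bases[0], whiten @ bases[1]).max()
print(f"largest canonical cosine before: {cos_before:.3f}, after: {cos_after:.2e}")
```

In this toy setting the total subspace dimension (9) equals the rank of G, so whitening makes the class subspaces exactly orthogonal. With overlapping or linearly dependent class subspaces the orthogonalization is only approximate.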
Hand-Shape Recognition Using the Distributions of Multi-Viewpoint Image Sets
This paper proposes a method for recognizing hand-shapes by using multi-viewpoint image sets. The recognition of a hand-shape is a difficult problem, as the appearance of the hand changes greatly depending on viewpoint, illumination conditions and individual characteristics. To overcome this problem, we apply the Kernel Orthogonal Mutual Subspace Method (KOMSM) to shift-invariance features obtained from multi-viewpoint images of a hand. When KOMSM is applied to hand recognition with many training images per class, its computational cost becomes heavy because of the kernel trick, so it is necessary to consider how to run it efficiently. We propose a new method that can drastically reduce the computational cost of KOMSM by using the centroids obtained by k-means clustering, together with the number of images belonging to each centroid. The validity of the proposed method is demonstrated through evaluation experiments using multi-viewpoint image sets of 30 classes of hand-shapes.
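The cost reduction described above can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: the abstract does not say how the cluster sizes enter the kernel computation, so the sketch weights each centroid by its share of the samples, which approximates the kernel autocorrelation matrix of all samples with a k x k Gram matrix instead of an n x n one. The helper names, the Gaussian kernel, and its width are hypothetical.

```python
import numpy as np

def nearest(x, centroids):
    """Index of the closest centroid for every row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns the centroids and the size of each cluster."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    counts = np.bincount(nearest(x, centroids), minlength=k)
    return centroids, counts

def gaussian_gram(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy data (hypothetical): 500 feature vectors of one class, compressed to 20 centroids.
rng = np.random.default_rng(1)
features = rng.standard_normal((500, 16))
centroids, counts = kmeans(features, k=20)

# Assumed weighting: each centroid counts as much as the samples it represents.
sqrt_w = np.sqrt(counts / counts.sum())
weighted_gram = sqrt_w[:, None] * gaussian_gram(centroids, centroids, sigma=4.0) * sqrt_w[None, :]
print(weighted_gram.shape)  # 20 x 20 instead of 500 x 500
```

The eigenvectors of `weighted_gram` would then stand in for those of the full Gram matrix when building the class subspace in the kernel feature space.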
Time-series Anomaly Detection based on Difference Subspace between Signal Subspaces
This paper proposes a new method for anomaly detection in time-series data by
incorporating the concept of difference subspace into the singular spectrum
analysis (SSA). The key idea is to monitor slight temporal variations of the
difference subspace between two signal subspaces corresponding to the past and
present time-series data, as the anomaly score. It is a natural generalization of
the conventional SSA-based method which measures the minimum angle between the
two signal subspaces as the degree of changes. By replacing the minimum angle
with the difference subspace, our method boosts the performance while using the
SSA-based framework as it can capture the whole structural difference between
the two subspaces in its magnitude and direction. We demonstrate our method's
effectiveness through performance evaluations on public time-series datasets.
Comment: 8 pages, an acknowledgement was added to v
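A rough sketch of the idea, under stated assumptions: signal subspaces are taken from the SSA trajectory (Hankel) matrices of a past and a present window, and the change is scored with the summed squared lengths of the difference vectors between canonical vector pairs, 2(1 - cos θ_i). The exact score used in the paper is not given in the abstract, so this score, the window length, and the rank are illustrative choices.

```python
import numpy as np

def trajectory_matrix(series, window):
    """Hankel matrix whose columns are the lagged windows of the series."""
    n_cols = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n_cols)], axis=1)

def signal_subspace(series, window, rank):
    """Top-`rank` left singular vectors of the trajectory matrix."""
    u, _, _ = np.linalg.svd(trajectory_matrix(series, window), full_matrices=False)
    return u[:, :rank]

def difference_magnitude(basis_past, basis_present):
    """Sum over canonical angles of 2 * (1 - cos(theta_i)); an assumed anomaly score."""
    cosines = np.linalg.svd(basis_past.T @ basis_present, compute_uv=False)
    return float(np.sum(2.0 * (1.0 - np.clip(cosines, 0.0, 1.0))))

# Toy series (hypothetical): a sinusoid whose frequency either stays the same or changes.
t = np.arange(200)
normal = np.sin(0.2 * t)
past = signal_subspace(normal[:100], window=20, rank=2)
present_same = signal_subspace(normal[100:], window=20, rank=2)
present_changed = signal_subspace(np.sin(0.5 * t[100:]), window=20, rank=2)

score_same = difference_magnitude(past, present_same)
score_changed = difference_magnitude(past, present_changed)
print(f"unchanged: {score_same:.2e}, changed: {score_changed:.3f}")
```

Unlike a score based only on the minimum canonical angle, this quantity accumulates the contribution of every canonical angle between the two subspaces.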
Controllable Multi-domain Semantic Artwork Synthesis
We present a novel framework for multi-domain synthesis of artwork from
semantic layouts. One of the main limitations of this challenging task is the
lack of publicly available segmentation datasets for art synthesis. To address
this problem, we propose a dataset, which we call ArtSem, that contains 40,000
images of artwork from 4 different domains with their corresponding semantic
label maps. We generate the dataset by first extracting semantic maps from
landscape photography and then propose a conditional Generative Adversarial
Network (GAN)-based approach to generate high-quality artwork from the semantic
maps without necessitating paired training data. Furthermore, we propose an
artwork synthesis model that uses domain-dependent variational encoders for
high-quality multi-domain synthesis. The model is improved and complemented
with a simple but effective normalization method, based on normalizing both the
semantic and style information jointly, which we call Spatially STyle-Adaptive
Normalization (SSTAN). In contrast to previous methods that only take semantic
layout as input, our model is able to learn a joint representation of both
style and semantic information, which leads to better generation quality for
synthesizing artistic images. Results indicate that our model learns to
separate the domains in the latent space, and thus, by identifying the
hyperplanes that separate the different domains, we can also perform
fine-grained control of the synthesized artwork. By combining our proposed
dataset and approach, we are able to generate user-controllable artwork that is
of higher quality than existing
Comment: 15 pages, accepted by CVMJ, to appea
Adaptive occlusion sensitivity analysis for visually explaining video recognition networks
This paper proposes a method for visually explaining the decision-making
process of video recognition networks with a temporal extension of occlusion
sensitivity analysis, called Adaptive Occlusion Sensitivity Analysis (AOSA).
The key idea here is to occlude a specific volume of data by a 3D mask in an
input 3D temporal-spatial data space and then measure the degree of change in the
output score. The occluded volume that produces a larger change is
regarded as a more critical element for classification. However, while the
occlusion sensitivity analysis is commonly used to analyze single image
classification, applying this idea to video classification is not so
straightforward as a simple fixed cuboid cannot deal with complicated motions.
To solve this issue, we adaptively set the shape of a 3D occlusion mask while
referring to motions. Our flexible mask adaptation is performed by considering
the temporal continuity and spatial co-occurrence of the optical flows
extracted from the input video data. We further propose a novel method to
reduce the computational cost of the proposed method with the first-order
approximation of the output score with respect to an input video. We
demonstrate the effectiveness of our method through various and extensive
comparisons with the conventional methods in terms of the deletion/insertion
metric and the pointing metric on the UCF101 dataset and the Kinetics-400 and
700 datasets.
Comment: 11 page
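The core occlusion step and its first-order approximation can be sketched on a toy problem. Everything here is an assumption for illustration: the "network" is a hypothetical tanh score with an analytic gradient, the occlusion value is zero, and the mask is a fixed cuboid grid rather than the motion-adaptive mask the paper proposes.

```python
import numpy as np

def score(video, weights):
    """Toy differentiable classifier score (stand-in for a video network)."""
    return float(np.tanh(np.sum(weights * video)))

def score_gradient(video, weights):
    """Analytic gradient of the toy score with respect to the input video."""
    return (1.0 - np.tanh(np.sum(weights * video)) ** 2) * weights

def occlusion_maps(video, weights, size):
    """Score drop per cuboid: exact re-evaluation and first-order approximation."""
    grid = tuple(-(-s // c) for s, c in zip(video.shape, size))  # ceil division
    exact, approx = np.zeros(grid), np.zeros(grid)
    base, grad = score(video, weights), score_gradient(video, weights)
    for idx in np.ndindex(*grid):
        region = tuple(slice(i * c, (i + 1) * c) for i, c in zip(idx, size))
        occluded = video.copy()
        occluded[region] = 0.0  # occlude the cuboid with zeros
        exact[idx] = base - score(occluded, weights)
        # f(x) - f(x_occ) ~ grad f(x) . (x - x_occ); only the cuboid differs.
        approx[idx] = np.sum(grad[region] * video[region])
    return exact, approx

# Toy input (hypothetical): 4 frames of 8x8 pixels; only one block drives the score.
rng = np.random.default_rng(0)
video = rng.uniform(0.5, 1.0, size=(4, 8, 8))
weights = np.zeros_like(video)
weights[2:4, 4:8, 0:4] = 0.05

exact, approx = occlusion_maps(video, weights, size=(2, 4, 4))
top_exact = tuple(int(i) for i in np.unravel_index(exact.argmax(), exact.shape))
top_approx = tuple(int(i) for i in np.unravel_index(approx.argmax(), approx.shape))
print(top_exact, top_approx)
```

The exact map needs one forward pass per cuboid, while the approximate map needs a single gradient evaluation, which is where the saving comes from. The two maps agree on the ranking here, although their magnitudes differ because the toy score is nonlinear.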
Discriminant feature extraction by generalized difference subspace
This paper reveals the discriminant ability of the orthogonal projection of data onto a generalized difference subspace (GDS) both theoretically and experimentally. In our previous work, we demonstrated that GDS projection works as the quasi-orthogonalization of class subspaces. Interestingly, GDS projection also works as a discriminant feature extraction through a mechanism similar to that of Fisher discriminant analysis (FDA). A direct proof of the connection between GDS projection and FDA is difficult due to the significant difference in their formulations. To avoid this difficulty, we first introduce geometrical Fisher discriminant analysis (gFDA) based on a simplified Fisher criterion. gFDA can work stably even with few samples, bypassing the small sample size (SSS) problem of FDA. Next, we prove that gFDA is equivalent to GDS projection with a small correction term. This equivalence ensures that GDS projection inherits the discriminant ability of FDA via gFDA. Furthermore, we discuss two useful extensions of these methods: 1) a nonlinear extension by the kernel trick, and 2) the combination with convolutional neural network (CNN) features. The equivalence and the effectiveness of the extensions have been verified through extensive experiments on the extended Yale B+, CMU face database, ALOI, ETH80, MNIST and CIFAR10, focusing on the SSS problem.
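As a minimal sketch of GDS projection, assuming the usual construction from the sum of class projection matrices: the eigenvectors with the largest eigenvalues (the component shared by the classes) are discarded and the remaining eigenvectors with positive eigenvalues span the GDS. The function name, the number of removed eigenvectors, and the toy data are assumptions, not values from the paper.

```python
import numpy as np

def generalized_difference_subspace(bases, n_remove, tol=1e-10):
    """Basis of the GDS: eigenvectors of G = sum_c U_c U_c^T with positive
    eigenvalues, after dropping the `n_remove` largest ones."""
    g = sum(u @ u.T for u in bases)
    eigvals, eigvecs = np.linalg.eigh(g)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    rank = int(np.sum(eigvals > tol))
    return eigvecs[:, n_remove:rank]

# Toy data (hypothetical): 4 classes in 30 dimensions; every class subspace contains
# one shared direction plus two class-specific random directions.
rng = np.random.default_rng(0)
shared = rng.standard_normal(30)
shared /= np.linalg.norm(shared)
bases = [
    np.linalg.qr(np.column_stack([shared, rng.standard_normal((30, 2))]))[0]
    for _ in range(4)
]

gds = generalized_difference_subspace(bases, n_remove=1)
leak = float(np.linalg.norm(gds.T @ shared))  # shared component left after projection
sample = bases[0] @ rng.standard_normal(3)    # a vector from class 0
feature = gds.T @ sample                      # its GDS-projected feature
print(gds.shape, f"shared component after projection: {leak:.2e}")
```

In this example the shared direction is an eigenvector of G with the largest eigenvalue, so removing it leaves features that contain only class-specific information.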
Resolving Marker Pose Ambiguity by Robust Rotation Averaging with Clique Constraints
Planar markers are useful in robotics and computer vision for mapping and
localisation. Given a detected marker in an image, a frequent task is to
estimate the 6DOF pose of the marker relative to the camera, which is an
instance of planar pose estimation (PPE). Although there are mature techniques,
PPE suffers from a fundamental ambiguity problem, in that there can be more
than one plausible pose solution for a PPE instance. Especially when
localisation of the marker corners is noisy, it is often difficult to
disambiguate the pose solutions based on reprojection error alone. Previous
methods choose between the possible solutions using heuristic criteria, or
simply ignore ambiguous markers.
We propose to resolve the ambiguities by examining the consistencies of a set
of markers across multiple views. Our specific contributions include a novel
rotation averaging formulation that incorporates long-range dependencies
between possible marker orientation solutions that arise from PPE ambiguities.
We analyse the combinatorial complexity of the problem, and develop a novel
lifted algorithm to effectively resolve marker pose ambiguities, without
discarding any marker observations. Results on real and synthetic data show
that our method is able to handle highly ambiguous inputs, and provides more
accurate and/or complete marker-based mapping and localisation.
Comment: 7 pages, 4 figures, 4 table
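For readers unfamiliar with rotation averaging, the following sketch shows its simplest form, the chordal L2 mean of several noisy estimates of a single rotation. This is a standard textbook technique swapped in for illustration. It is not the paper's formulation, which averages many rotations jointly with clique constraints and a lifted algorithm. The noise level and helper names are assumptions.

```python
import numpy as np

def rotation_from_axis_angle(vec):
    """Rodrigues' formula: rotation matrix for an axis-angle vector."""
    angle = np.linalg.norm(vec)
    if angle < 1e-12:
        return np.eye(3)
    k = vec / angle
    skew = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * (skew @ skew)

def chordal_mean(rotations):
    """Project the sum of rotation matrices back onto SO(3) with an SVD."""
    u, _, vt = np.linalg.svd(np.sum(np.asarray(rotations), axis=0))
    fix = np.diag([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])  # enforce det = +1
    return u @ fix @ vt

def rotation_angle(rot_a, rot_b):
    """Angle in radians of the relative rotation between two rotation matrices."""
    cos_angle = (np.trace(rot_a.T @ rot_b) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Toy observations (hypothetical): 50 noisy estimates of one marker orientation.
rng = np.random.default_rng(0)
truth = rotation_from_axis_angle(np.array([0.3, -0.2, 0.5]))
observations = [
    truth @ rotation_from_axis_angle(rng.normal(0.0, 0.05, size=3)) for _ in range(50)
]

mean_rotation = chordal_mean(observations)
error = rotation_angle(truth, mean_rotation)
print(f"angular error of the averaged rotation: {error:.4f} rad")
```

A plain mean like this is sensitive to outliers, for example observations where the wrong one of two ambiguous pose solutions was chosen, which is part of what motivates a robust formulation.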